In-line mediation for manipulating three-dimensional content on a display device

ABSTRACT

A user holds the mobile device upright or sits in front of a nomadic or stationary device, views the monitor from a suitable distance, and physically reaches behind the device with her hand to manipulate a 3D object displayed on the monitor. The device functions as a 3D in-line mediator that provides visual coherency to the user when she reaches behind the device to use hand gestures and movements to manipulate a perceived object behind the device and sees that the 3D object on the display is being manipulated. The perceived object that the user manipulates behind the device with bare hands corresponds to the 3D object displayed on the device. The visual coherency arises from the alignment of the user's head or eyes, the device, and the 3D object. The user's hand may be represented as an image of the actual hand or as a virtualized representation of the hand, such as part of an avatar.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims priority of U.S. provisional patent application No. 60/093,651, filed Sep. 2, 2008, entitled “GESTURE AND MOTION-BASED NAVIGATION AND INTERACTION WITH THREE-DIMENSIONAL VIRTUAL CONTENT ON A MOBILE DEVICE,” which is hereby incorporated by reference.

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates generally to hardware and software for user interaction with digital three-dimensional data. More specifically, it relates to devices having displays and to human interaction with data displayed on the devices.

2. Description of the Related Art

The amount of three-dimensional content available on the Internet and in other contexts is increasing at a rapid pace. Consumers are getting more accustomed to hearing about “3D” in various contexts, such as movies, video games, and online virtual cities. Three-dimensional content may be found in medical imaging (e.g., examining MRIs), modeling and prototyping, information visualization, architecture, tele-immersion and collaboration, geographic information systems (e.g., Google Earth), and in other fields. Current systems, including computers, cell phones, and, more generally, content display systems (e.g., TVs), fall short of taking advantage of 3D content because they do not provide an immersive user experience. For example, they do not provide intuitive, natural, and unobtrusive interaction with 3-D objects.

With respect to mobile devices, such devices presently do not provide users who seek interaction with digital 3D content a natural, intuitive, and immersive experience. Mobile device users are not able to make gestures or manipulate 3D objects using bare hands in a natural and intuitive way.

Although some displays allow users to manipulate 3D content with bare hands in front of the display (monitor), current display systems that are able to provide some interaction with 3D content require inconvenient or intrusive peripherals that make the experience unnatural to the user. For example, some current methods of providing tactile or haptic feedback require vibro-tactile gloves. In other examples, current methods of rendering 3-D content include stereoscopic displays (requiring the user to wear a pair of special glasses), auto-stereoscopic displays (based on lenticular lenses or parallax barriers, which commonly cause eye strain and headaches as side effects), head-mounted displays (requiring heavy head gear or goggles), and volumetric displays, such as those based on oscillating mirrors or screens (which do not allow bare hand direct manipulation of 3-D content).

In addition, mobile device displays, such as displays on cell phones, only allow for a limited field of view (FOV) because the display size is generally limited by the size of the device. For example, the size of a non-projection display cannot be larger than the mobile device that contains the display. Therefore, existing solutions for mobile displays (which are generally light-emitting displays) limit the immersive experience for the user. Furthermore, it is presently difficult to navigate through virtual worlds and 3-D content via a first-person view on mobile devices, which is one aspect of creating an immersive experience. Mobile devices also do not provide satisfactory user awareness of virtual surroundings, another important aspect of creating an immersive experience.

Some display systems require a user to reach behind the monitor. However, in these systems the user's hands must physically touch the back of the monitor, and the interaction is intended only to manipulate 2-D images, such as moving images on the screen.

SUMMARY OF THE INVENTION

A user is able to use a mobile device having a display, such as a cell phone or a media player, to view and manipulate 3D content displayed on the device by reaching behind the device and manipulating a perceived 3D object. The user's eyes, device, and a perceived 3D object are aligned or “in-line,” such that the device performs as a type of in-line mediator between the user and the perceived 3D object. This alignment results in a visual coherency to the user when reaching behind the device to make hand gestures and movements to manipulate the 3D content. That is, the user's hand movements behind the device are at a natural and intuitive distance and are aligned with the 3D object displayed on the device monitor so that the user has a natural visual impression that she is actually handling the 3D object shown on the monitor.

One embodiment of the present invention is a method of detecting manipulation of a digital 3D object displayed on a device having a front side with a display monitor facing the user and a back side having a sensor facing away from the user. A hand or other object may be detected within a specific area of the back side of the device having the sensor, such as a camera. The hand is displayed on the monitor and its movements within a specific area of the back side of the device are tracked. The movements are the result of the user intending to manipulate the displayed 3D object and are made by the user in manipulating a perceived 3D object behind the device, but without having to physically touch the back side of the device. A collision between the displayed hand and the displayed 3D object may be detected by the device, resulting in a modification of the image of the 3D object displayed on the device. In this manner the device functions as a 3D in-line mediator between the user and the 3D object.

In another embodiment, a display device includes a processor and a memory component storing digital 3D content data. The device also includes a tracking sensor component for tracking movement of an object that is in proximity of the device. In one embodiment, the tracking sensor component faces the back of the device (away from the user) and is able to detect movements and gestures of a hand of a user who reaches behind the device. A hand tracking module processes movement data from the tracking sensor and a collision detection module detects collisions between a user's hand and a 3D object.

BRIEF DESCRIPTION OF THE DRAWINGS

References are made to the accompanying drawings, which form a part of the description and in which are shown, by way of illustration, particular embodiments:

FIG. 1A is an illustration of a user using a mobile device as a 3D in-line mediator to manipulate digital 3D content displayed on the device in accordance with one embodiment;

FIG. 1B is an illustration of a user using a laptop computer as a 3D in-line mediator to manipulate digital 3D content displayed on the device in accordance with one embodiment;

FIG. 1C is an illustration of a user using a desktop computer as a 3D in-line mediator to manipulate digital 3D content displayed on a desktop monitor in accordance with one embodiment;

FIGS. 2A and 2B are top views illustrating 3D in-line mediation;

FIG. 2C is an illustration of a side perspective of the user shown in FIG. 2A;

FIG. 3A is a more detailed top view of a user utilizing a mobile device as a 3D in-line mediator for manipulating digital 3D content in accordance with one embodiment;

FIG. 3B shows a scene that a user sees when facing a device and when reaching behind the device;

FIG. 4 is a flow diagram of a process of enabling in-line mediation in accordance with one embodiment;

FIG. 5 is a block diagram showing relevant components of a device capable of functioning as a 3D in-line mediator in accordance with one embodiment; and

FIGS. 6A and 6B illustrate a computer system suitable for implementing embodiments of the present invention.

DETAILED DESCRIPTION OF THE INVENTION

Methods and systems for using a display device as a three-dimensional (3D) in-line mediator for interacting with digital 3D content displayed on the device are described in the various figures. The use of a display device as an in-line mediator enables intuitive bare hand manipulation of digital 3D content by allowing a user to see the direct effect of the user's handling of the 3D content on the display by reaching behind the display device. In this manner, the device display functions as an in-line mediator between the user and the 3D content, enabling a type of visual coherency for the user. That is, the 3D content is visually coherent or in-line from the user's perspective. The user's hand, or a representation of it, is shown on the display, maintaining the visual coherency. Furthermore, by reaching behind the device, the user's view of the 3D content on the display is not obstructed by the user's arm or hand.

FIG. 1A is an illustration of a user using a mobile device as a 3D in-line mediator to manipulate digital 3D content displayed on the device in accordance with one embodiment. The term “3D in-line mediation” refers to the user's eyes, the 3D content on the display, and the user's hands behind the display (but not touching the back of the display) being aligned in real 3D space. A user 102 holds a mobile device 104 with one hand 105 and reaches behind device 104 with another hand 106. One or more sensors (not shown), collectively referred to as a sensor component, on device 104 face away from user 102 in the direction of the user's hand behind device 104. User's hand 106 is detected and a representation 108 of the hand is displayed on a device monitor 109 displaying a 3D scene 107. As discussed in greater detail below, representation 108 may be an unaltered (actual) image of the user's hand that is composited onto scene 107 or may be a one-to-one mapping of real hand 106 to a virtual representation (not shown), such as an avatar hand, which becomes part of 3D or virtual scene 107. The term “hand” as used herein may include, in addition to the user's hand, fingers, and thumb, the user's wrist and forearm, all of which may be detected by the sensor component.

Mobile device 104 may be a cell phone, a media player (e.g., MP3 player), portable gaming device, or any type of smart handset device having a display. It is assumed that the device is IP-enabled or capable of connecting to a suitable network to access 3D content over the Internet. However, the various embodiments described below do not necessarily require that the device be able to access a network. For example, the 3D content displayed on the device may be resident on a local storage of the device, such as on a hard disk or other mass storage component or on a local cache. The sensor component on mobile device 104 and the accompanying sensor software may be one or more of various types of sensors, a typical one being a conventional camera. Implementations of various sensor components are described below. Although the methods and systems of the various embodiments are described using a mobile device, they may equally apply to nomadic devices, such as laptops and netbook computers (i.e., devices that are portable), and to stationary devices, such as desktop computers, workstations, and the like, as shown in FIGS. 1B and 1C.

FIG. 1B is an illustration of a user 110 using a laptop computer 112 or similar nomadic computing device (e.g., netbook computer, mini laptop, etc.) as a 3D in-line mediator to manipulate digital 3D content displayed on device 112 in accordance with one embodiment. The laptop computer has a sensor component (not shown) facing away from user 110. The sensor component may be an internal component of laptop 112 or a peripheral 113 attached to it with associated software installed on laptop 112. Both of user 110's hands 114 and 115 are composited onto a 3D scene 116 on display 117.

Similarly, FIG. 1C is an illustration of a user 118 using a desktop computer monitor 119 as a 3D in-line mediator to manipulate digital 3D content displayed on desktop monitor 119 in accordance with one embodiment. As a practical matter, it is preferable that monitor 119 be some type of flat panel monitor, such as an LCD or plasma monitor, so that the user is physically able to reach behind the monitor. A tracking sensor component 120 detects a user's hand 122 behind desktop monitor 119. In this example, hand 122 is mapped to a digital representation, such as an avatar hand 124. User 118 may also move his other hand behind monitor 119 as in FIG. 1B.

FIGS. 2A and 2B are top views illustrating 3D in-line mediation. They show a user 200 holding a mobile device 202 with her left hand 204. User 200 extends her right hand 206 behind device 202. An area 208 behind mobile device 202, indicated by the solid angled lines, is a so-called virtual 3D space, and the area 210 surrounding device 202 is the physical or real environment or world (“RW”) that user 200 is in. A segment of the user's hand 206 in virtual space 208 is shaded. A sensor component (not shown) detects the presence and movement in area 208. User 200 can move hand 206 around in area 208 to manipulate 3D objects shown on device 202. FIG. 2B shows a user 212 reaching behind a laptop 214 with both hands 216 and 218.

The sensor component, also referred to as a tracking component, may be implemented using various types of sensors. These sensors may be used to detect the presence of a user's hand (or any object) behind the mobile device's monitor. In a preferred embodiment, a standard or conventional mobile device or cell phone camera is used to sense the presence of a hand and its movements or gestures. Image differentiation or optic flow may also be used to detect and track hand movement. In other embodiments, a conventional camera may be replaced with infrared detection components to perform hand detection and tracking. For example, a mobile device camera facing away from the user and that is IR sensitive (or has its IR filter removed), possibly in combination with additional IR illumination (e.g., LED), may look for the brightest object within the range of the camera, which will likely be the user's hand. Dedicated infrared sensors with IR illumination may also be used. In another embodiment, redshift thermal imaging may be used. This option provides passive optical components that redshift a standard CMOS imager to be able to detect long wavelength and thermal infrared radiation. Another type of sensor may be ultrasonic gesture detection sensors. Sensor software options include off-the-shelf gesture recognition tools, such as software for detecting hands using object segmentation and/or optic flow. Other options include spectral imaging software for detecting skin tones, pseudo-thermal imaging, and 3D depth cameras using time-of-flight.
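By way of illustration only, the skin-tone segmentation option mentioned above can be sketched in a few lines of image-processing code. The following is a minimal sketch, assuming Python with OpenCV and NumPy; the YCrCb skin-tone bounds, the minimum blob area, and the use of the default camera index are illustrative assumptions rather than values specified by this disclosure.

```python
import cv2
import numpy as np

# Illustrative skin-tone bounds in YCrCb space (assumed values; tune per device/lighting).
SKIN_LOWER = np.array([0, 133, 77], dtype=np.uint8)
SKIN_UPPER = np.array([255, 173, 127], dtype=np.uint8)

def detect_hand(frame_bgr, min_area=2000):
    """Return the largest skin-colored contour in the frame, or None if no hand is found."""
    ycrcb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2YCrCb)
    mask = cv2.inRange(ycrcb, SKIN_LOWER, SKIN_UPPER)
    # Remove speckle noise before looking for a hand-sized blob.
    kernel = np.ones((5, 5), np.uint8)
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
    # OpenCV 4 return signature assumed (contours, hierarchy).
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    hand = max(contours, key=cv2.contourArea)
    return hand if cv2.contourArea(hand) >= min_area else None

cap = cv2.VideoCapture(0)  # stand-in for the backward-facing sensor of the device
ok, frame = cap.read()
if ok:
    contour = detect_hand(frame)
    print("hand detected" if contour is not None else "no hand in view")
cap.release()
```

A comparable sketch could substitute frame differencing or optic flow for the color threshold; the surrounding detection and tracking flow would remain the same.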

FIG. 2C is an illustration of a side perspective of user 200 shown initially in FIG. 2A. User hand 206 is in front of a sensor component of device 202, the sensor facing away from user 200. User 200 holds device 202 with hand 204. Hand 206 is behind device 202 in virtual 3D space 208. Space 208 is the space in proximity of the back side of device 202 that is tracked by the backward facing sensor. Real or physical world 210 is outside the proximity or tracked area of device 202. Gestures and movements of hand 206 in 3D space 208 are made by user 200 in order to manipulate a perceived 3D object (not shown) on the display of device 202. In another scenario, the gestures may not be for manipulating an object but simply for the purpose of making a gesture (e.g., waving, pointing, etc.) in a virtual world environment.

FIG. 3A is a more detailed top view of a user utilizing a mobile device as a 3D in-line mediator for manipulating digital 3D content in accordance with one embodiment. It shows a user's head 303, a “perceived” digital 3D object 304, and a display device 306 as being in-line or aligned, with display device 306 functioning as the in-line mediator. It also shows a user's hand 308 reaching behind device 306. FIG. 3B shows a scene 310 that user head 303 sees when facing device 306 and when reaching behind device 306. The user sees digital 3D object 312 on the screen and user hand 308 or a representation of it touching or otherwise manipulating object 312. It is helpful to note that although the figures only show a user touching an object or making gestures, the user is also able to manipulate a digital 3D object in other ways, such as by touching, lifting, holding, moving, pushing, pulling, dropping, throwing, rotating, deforming, bending, stretching, compressing, squeezing, or pinching it. When the user reaches behind the monitor or device, she can move her hand(s) to manipulate 3D objects she sees on the display. As explained below, there is a depth component. If the 3D object is in front of the 3D scene she is viewing, she will not have to reach far behind the device/monitor. If the object is further back in the 3D scene, the user may have to physically reach further behind the device/monitor in order to touch (or collide with) the 3D object.

FIG. 4 is a flow diagram of a process of enabling in-line mediation in accordance with one embodiment. The process described in FIG. 4 begins after the user has powered on the device, which may be mobile, nomadic, or stationary. A tracking component has also been activated. The device displays 3D content, such as an online virtual world or any other form of 3D content, examples of which are provided above. The user directly faces the display; that is, she sits squarely in front of the laptop or desktop monitor or holds the cell phone directly in front of her. There may be a 3D object displayed on the screen that the user wants to manipulate (e.g., pick up a ball, move a chair, etc.) or there may be a 3D world scene in which the user wants to perform a hand gesture or movement (e.g., wave to a 3D person or an avatar). Other examples that do not involve online 3D content may include moving or changing the orientation of 3D medical imaging data, playing a 3D video game, interacting with 3D content, such as a movie or show, and so on.

The user begins by moving a hand behind the device (hereafter, for ease of illustration, the term “device” may refer to mobile device screens and laptop/desktop monitors). At step 402 a tracking component detects the presence of the user's hand. There are various ways this can be done. One conventional way is by detecting the skin tone of the user's hand. As described above, there are numerous types of tracking components or sensors that may be used. Which one is most suitable will likely depend on the features and capabilities of the device (i.e., mobile, nomadic, stationary, etc.). A typical cell phone camera is capable of detecting the presence of a human hand. An image of the hand (or hands) is transmitted to a compositing component.

At step 404 the hand is displayed on the screen. The user sees either an unaltered view of her hand (not including the background behind and around the hand) or an altered representation of the hand. If an image of the user's hand is displayed, known compositing techniques may be used. For example, some techniques may involve combining two video sources: one for the 3D content and another representing video images of the user's hand. Other techniques for overlaying or compositing the images of the hand over the 3D content data may be used, and which technique is most suitable will likely depend on the type of device. If the user's hand is mapped to an avatar hand or other digital representation, software from the 3D content provider or other conventional software may be used to perform a mapping of the user hand images to an avatar image, such as a robotic hand. Thus, after step 404, a representation of the user's still stationary hand can be seen on the device. That is, its presence has been detected and is being represented on the device.
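As a concrete illustration of the compositing step, the fragment below blends a segmented hand image over a rendered 3D frame, using the detection mask as a per-pixel alpha channel. It is a sketch only, assuming the scene, the hand image, and the mask are NumPy arrays of the same size; the actual overlay technique will depend on the device's graphics pipeline.

```python
import numpy as np

def composite_hand(scene_rgb, hand_rgb, hand_mask, opacity=1.0):
    """Overlay the segmented hand onto the rendered 3D scene.

    scene_rgb, hand_rgb: HxWx3 uint8 images of the same size.
    hand_mask: HxW array, nonzero where the hand was detected.
    """
    alpha = (hand_mask > 0).astype(np.float32) * opacity  # per-pixel blend weight
    alpha = alpha[..., np.newaxis]                         # broadcast over color channels
    out = scene_rgb.astype(np.float32) * (1.0 - alpha) + hand_rgb.astype(np.float32) * alpha
    return out.astype(np.uint8)

# Example with synthetic data: a gray scene and a small "hand" patch in the center.
scene = np.full((240, 320, 3), 80, dtype=np.uint8)
hand = np.zeros_like(scene)
mask = np.zeros((240, 320), dtype=np.uint8)
hand[100:140, 140:180] = (210, 170, 150)
mask[100:140, 140:180] = 255
frame = composite_hand(scene, hand, mask)
print(frame.shape, frame[120, 160])  # hand color shows through at the center pixel
```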

At step 406 the user starts moving the hand, either by moving it up, down, left, right, or inward or outward (relative to the device) or by gesturing (or both). The initial position of the hand and its subsequent movement can be described in terms of x, y, and z coordinates. The tracking component begins tracking hand movement and gesturing, which has horizontal, vertical, and depth components. For example, a user may be viewing a 3D virtual world room on the device and wants to move an object that is in the far left corner of the room (which has a certain depth) to the near right corner of the room. In one embodiment of the invention, the user may have to move her hand to a position that is, for example, about 12 inches behind and slightly left of the device. This may require that the user extend her arm out a little further than what would be considered a normal or natural distance. After grabbing the object, as discussed in step 408 below, the user moves her hand to a position that is maybe 2-3 inches behind and to the right of the device. This example illustrates that there is a depth component in the hand tracking that is implemented to maintain the in-line mediation performed by the device.
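The depth component described above amounts to mapping the tracked hand position, including its physical distance behind the device, into the coordinate system of the virtual scene. The sketch below shows one simple linear mapping; the calibration constants and the TrackedHand structure are illustrative assumptions, not part of this disclosure.

```python
from dataclasses import dataclass

@dataclass
class TrackedHand:
    x_px: float      # horizontal position in the sensor image (pixels)
    y_px: float      # vertical position in the sensor image (pixels)
    depth_cm: float  # estimated distance behind the device (centimeters)

def to_scene_coords(hand, image_w=640, image_h=480,
                    scene_w=2.0, scene_h=1.5, scene_depth=3.0, max_reach_cm=45.0):
    """Map a tracked hand to (x, y, z) in virtual scene coordinates.

    Assumed linear calibration: the full sensor image spans scene_w x scene_h
    virtual units, and max_reach_cm of physical reach spans scene_depth units.
    """
    x = (hand.x_px / image_w - 0.5) * scene_w
    y = (0.5 - hand.y_px / image_h) * scene_h   # image y grows downward
    z = min(hand.depth_cm / max_reach_cm, 1.0) * scene_depth
    return (x, y, z)

# A hand roughly centered in the image, about 12 inches (30 cm) behind the device.
print(to_scene_coords(TrackedHand(x_px=300, y_px=250, depth_cm=30.0)))
```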

At step 408 the digital representation of the user's hand on the device collides with or touches an object. This collision is detected by comparing sensor data from the tracking sensor with geometrical data from the 3D data repository. The user moves her hand behind the device in a way that causes the digital representation of her hand on the screen to collide with the object, at which point she can grab, pick up, or otherwise manipulate the object. The user's hand may be characterized as colliding with the perceived object that is “floating” behind the device, as described in FIG. 3A. In the described embodiment, in order to maintain the 3D in-line mediation or visual coherency, the user's eyes are looking straight at the middle of the screen. That is, there is a vertical and horizontal alignment of the user's head with the device and the 3D content. In another embodiment, the user's face may also be tracked, which may enable changes in the 3D content images to reflect movement in the user's head (i.e., perspective).
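The collision test described here, which compares the tracked hand position against geometry from the 3D data repository, can be approximated by a simple bounding-volume check. The following sketch uses a bounding sphere around each object as a stand-in for full mesh geometry; the object names, positions, and radii are hypothetical.

```python
import math
from dataclasses import dataclass

@dataclass
class SceneObject:
    name: str
    center: tuple   # (x, y, z) in scene coordinates
    radius: float   # bounding-sphere radius, a stand-in for real geometry

def find_collision(hand_pos, objects):
    """Return the first object whose bounding sphere contains the hand position."""
    for obj in objects:
        if math.dist(hand_pos, obj.center) <= obj.radius:
            return obj
    return None

scene = [SceneObject("ball", (0.2, -0.1, 1.0), 0.15),
         SceneObject("chair", (-0.6, -0.4, 2.2), 0.5)]

hit = find_collision((0.25, -0.05, 1.05), scene)
if hit is not None:
    print(f"collision with {hit.name}: grab or otherwise modify the displayed object")
```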

In one embodiment, an “input-output coincidence” model is used to close a human-computer interaction feature referred to as a perception-action loop, where perception is what the user sees and action is what the user does. This enables a user to see the consequences of an interaction, such as touching a 3-D object, immediately. As described above, a user's hand is aligned with or in the same position as the 3-D object that is being manipulated. That is, from the user's perspective, the hand is aligned with the 3-D object so that it looks like the user is lifting or moving a 3-D object as if it were a physical object. What the user sees makes sense based on the action being taken by the user. In one embodiment, the system provides tactile feedback to the user upon detecting a collision between the user's hand and the 3-D object.

At step 410 the image of the 3D scene is modified to reflect the user's manipulation of the 3D object. If there is no manipulation of a 3D object (and thus no object collision), the image on the screen changes as the user moves her hand, as it does when the user manipulates a 3D object. The changes in the 3D image on the screen may be done using known methods for processing 3D content data. These methods or techniques may vary depending on the type of device, the source of the data, and other factors. The process then repeats by returning to step 402 where the presence of the user's hand is again detected. The process described in FIG. 4 is continuous in that the user's hand movement is tracked as long as it is within the range of the tracking component. In the described embodiment, the device is able to perform as a 3D in-line mediator as long as the user's head or perspective is kept in line with the device, which, in turn, allows the user's hand movements behind the device to be visually coherent with the hand movements shown on the screen and vice versa. That is, the user moves her hand in the physical world based on actions she wants to perform in the digital 3D environment shown on the screen.
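The continuous, repeating nature of the process in FIG. 4 can be summarized as a per-frame loop that detects the hand, tracks it, tests for collisions, and updates the displayed scene. The outline below is a sketch only; every parameter (the sensor object, detect_hand, to_scene_coords, find_collision, apply_manipulation, and render) is a hypothetical placeholder for the corresponding module or helper sketched elsewhere in this description.

```python
def mediation_loop(sensor, scene, detect_hand, to_scene_coords,
                   find_collision, apply_manipulation, render):
    """One pass per frame: detect, track, test for collision, and redraw the scene."""
    while sensor.is_active():
        frame = sensor.capture()                            # step 402: look for the hand
        hand = detect_hand(frame)
        if hand is None:
            render(scene, hand_overlay=None)                # nothing to composite this frame
            continue
        hand_pos = to_scene_coords(hand)                    # step 406: x, y, and depth tracking
        target = find_collision(hand_pos, scene.objects)    # step 408: collision test
        if target is not None:
            apply_manipulation(target, hand_pos)            # step 410: modify the 3D object
        render(scene, hand_overlay=hand)                    # steps 404/410: composite and redraw
```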

FIG. 5 is a block diagram showing relevant components of a device capable of functioning as a 3D in-line mediator in accordance with one embodiment. Many of the components shown here have been described above. A device 500 has a display component (not shown) for displaying digital 3D content data 501, which may be stored in mass storage or in a local cache (not shown) on device 500 or may be downloaded from the Internet or from another source. A tracking sensor component 502 may include one or more conventional (2D) cameras and 3D (depth) cameras and non-camera peripherals. A 3-D camera may provide depth data, which simplifies gesture recognition by use of depth keying. In another embodiment, a wide angle lens may be used in a camera, which may require less processing by an imaging system but may produce more distortion. Component 502 may also have other capabilities as described above, such as infrared detection, optic flow, image differentiation, redshift thermal imaging, and spectral processing; other techniques may also be used in tracking component 502. Tracking sensor component 502 is responsible for tracking the position of body parts within the range of detection. This position data is transmitted to hand tracking module 504 and to face tracking module 506, and each module identifies the features relevant to it.

Hand tracking module 504 identifies features of the user's hand positions, including the positions of the fingers, wrist, and arm. It determines the location of these body parts in the 3D environment. Data from module 504 goes to two components related to hand and arm position: gesture detection module 508 and hand collision detection module 510. In one embodiment, a user “gesture” results in a modification of 3D content 501. A gesture may include lifting, holding, squeezing, pinching, or rotating a 3D object. These actions typically result in some type of modification of the object in the 3D environment. A modification of an object may include a change in its location (lifting or turning) without there being an actual deformation or change in shape of the object. The gesture detection data may be applied directly to the graphics data representing 3D content 501.
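A gesture detector of the kind module 508 represents can be sketched as a mapping from tracked fingertip positions to object modifications. The example below classifies a pinch from the thumb-to-index distance and applies a grab that changes an object's location without deforming it; the distance threshold and the dictionary-based object interface are assumptions made only for illustration.

```python
import math

PINCH_THRESHOLD = 0.05  # scene units; illustrative value

def classify_gesture(thumb_tip, index_tip):
    """Very small gesture vocabulary: 'pinch' when the fingertips nearly touch."""
    return "pinch" if math.dist(thumb_tip, index_tip) < PINCH_THRESHOLD else "open"

def apply_gesture(gesture, obj, hand_pos):
    """Translate a detected gesture into a modification of the 3D object."""
    if gesture == "pinch":
        obj["held"] = True
        obj["center"] = hand_pos   # lifting or turning changes location, not shape
    else:
        obj["held"] = False

ball = {"name": "ball", "center": (0.2, -0.1, 1.0), "held": False}
g = classify_gesture(thumb_tip=(0.21, -0.09, 1.0), index_tip=(0.23, -0.08, 1.01))
apply_gesture(g, ball, hand_pos=(0.22, -0.08, 1.0))
print(g, ball)
```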

In another embodiment, tracking sensor component 502 may also track the user's face. In this case, face tracking data is transmitted to face tracking module 506. Face tracking may be utilized in cases where the user is not vertically aligned (i.e., the user's head is not looking directly at the middle of the screen) with the device and the perceived object.

In another embodiment, data from hand collision detection module 510 may be transmitted to a tactile feedback controller 512, which is connected to one or more actuators 514 which are external to device 500. In this embodiment, the user may receive haptic feedback when the user's hand collides with a 3D object. Generally, it is preferred that actuators 514 be as unobtrusive as possible. In one embodiment, they are vibrating wristbands, which may be wired or wireless. Using wristbands allows for bare hand manipulation of 3D content as described above. Tactile feedback controller 512 receives a signal that there is a collision or contact and causes tactile actuators 514 to provide a physical sensation to the user. For example, with vibrating wristbands, the user's wrist will sense a vibration or similar physical sensation indicating contact with the 3-D object.
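The path from collision detection to the wristband actuators can be sketched as a small controller that converts collision events into vibration commands. The actuator class below is a hypothetical stand-in; an actual wristband would expose a vendor-specific wired or wireless driver, and the intensity mapping is an assumed heuristic.

```python
import time

class WristbandActuator:
    """Hypothetical stand-in for a wired or wireless vibrating wristband."""
    def __init__(self, side):
        self.side = side

    def vibrate(self, duration_s, intensity):
        # A real driver would send a command over Bluetooth or USB here.
        print(f"[{self.side}] vibrate {duration_s:.2f}s at intensity {intensity:.1f}")

class TactileFeedbackController:
    def __init__(self, actuators):
        self.actuators = actuators

    def on_collision(self, obj_name, penetration_depth):
        """Deeper contact with the 3D object produces a stronger, longer pulse."""
        intensity = min(1.0, 0.3 + penetration_depth * 2.0)
        for actuator in self.actuators:
            actuator.vibrate(duration_s=0.1 + penetration_depth, intensity=intensity)

controller = TactileFeedbackController([WristbandActuator("right wrist")])
controller.on_collision("ball", penetration_depth=0.05)  # light touch
time.sleep(0.1)
controller.on_collision("ball", penetration_depth=0.3)   # firmer grab
```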

As is evident from the figures and the various embodiments, the present invention enables a user to interact with digital 3D content in a natural and immersive way by enabling visual coherency, thereby creating an immersive volumetric interaction with the 3D content. In one embodiment, a user uploads or executes 3D content onto a mobile computing device, such as a cell phone. This 3D content may be a virtual world that the user has visited using a browser on the mobile device (e.g., Second Life or any other site that provides virtual world content). Other examples include movies, video games, online virtual cities, medical imaging (e.g., examining MRIs), modeling and prototyping, information visualization, architecture, tele-immersion and collaboration, and geographic information systems (e.g., Google Earth). The user holds the display of the device upright at a comfortable distance in front of the user's eyes, for example at 20-30 centimeters. The display of the mobile device is used as a window into the virtual world. Using the mobile device as an in-line mediator between the user and the user's hand, the user is able to manipulate 3D objects shown on the display by reaching behind the display of the device and making hand gestures and movements around a perceived object behind the display. The user sees the gestures and movements on the display and the 3D object that they are affecting.

As discussed above, one aspect of creating an immersive and natural user interaction with 3D content using a mobile device is enabling the user to have bare-hand interaction with objects in the virtual world. That is, allowing the user to manipulate and “touch” digital 3D objects using the mobile device and not requiring the user to use any peripheral devices, such as gloves, finger sensors, motion detectors, and the like.

FIGS. 6A and 6B illustrate a computing system 600 suitable for implementing embodiments of the present invention. FIG. 6A shows one possible physical form of the computing system. Of course, the computing system may have many physical forms including an integrated circuit, a printed circuit board, a small handheld device (such as a mobile telephone, handset or PDA), a personal computer or a super computer. Computing system 600 includes a monitor 602, a display 604, a housing 606, a disk drive 608, a keyboard 610 and a mouse 612. Disk 614 is a computer-readable medium used to transfer data to and from computer system 600.

FIG. 6B is an example of a block diagram for computing system 600. Attached to system bus 620 are a wide variety of subsystems. Processor(s) 622 (also referred to as central processing units, or CPUs) are coupled to storage devices including memory 624. Memory 624 includes random access memory (RAM) and read-only memory (ROM). As is well known in the art, ROM acts to transfer data and instructions uni-directionally to the CPU and RAM is used typically to transfer data and instructions in a bi-directional manner. Both of these types of memories may include any of the suitable computer-readable media described below. A fixed disk 626 is also coupled bi-directionally to CPU 622; it provides additional data storage capacity and may also include any of the computer-readable media described below. Fixed disk 626 may be used to store programs, data and the like and is typically a secondary storage medium (such as a hard disk) that is slower than primary storage. It will be appreciated that the information retained within fixed disk 626 may, in appropriate cases, be incorporated in standard fashion as virtual memory in memory 624. Removable disk 614 may take the form of any of the computer-readable media described below.

CPU 622 is also coupled to a variety of input/output devices such as display 604, keyboard 610, mouse 612 and speakers 630. In general, an input/output device may be any of: video displays, track balls, mice, keyboards, microphones, touch-sensitive displays, transducer card readers, magnetic or paper tape readers, tablets, styluses, voice or handwriting recognizers, biometrics readers, or other computers. CPU 622 optionally may be coupled to another computer or telecommunications network using network interface 640. With such a network interface, it is contemplated that the CPU might receive information from the network, or might output information to the network in the course of performing the above-described method steps. Furthermore, method embodiments of the present invention may execute solely upon CPU 622 or may execute over a network such as the Internet in conjunction with a remote CPU that shares a portion of the processing.

Although illustrative embodiments and applications of this invention are shown and described herein, many variations and modifications are possible which remain within the concept, scope, and spirit of the invention, and these variations would become clear to those of ordinary skill in the art after perusal of this application. Accordingly, the embodiments described are illustrative and not restrictive, and the invention is not to be limited to the details given herein, but may be modified within the scope and equivalents of the appended claims.

1. A method of detecting manipulation of a digital 3D object displayed on a device having a front side with a display and a back side, the method comprising: detecting a hand within a specific area of the back side of the device, the back side having a sensor; displaying the hand on the display; tracking movement of the hand within the specific area of the back side, wherein said movement is caused by a user intending to manipulate the displayed 3D object; detecting a collision between the displayed hand and the displayed 3D object; and modifying an image of the 3D object displayed on the device, wherein the device is a 3D in-line mediator between the user and the 3D object.
2. A method as recited in claim 1 further comprising: detecting a hand gesture within the specific area.
3. A method as recited in claim 1 wherein modifying an image of the 3D object further comprises: deforming the image of the 3D object.
4. A method as recited in claim 1 wherein modifying an image of the 3D object further comprises: moving the image of the 3D object.
5. A method as recited in claim 1 further comprising: displaying the modified image on the device.
6. A method as recited in claim 1 wherein the user reaches behind the device to manipulate a perceived object corresponding to the 3D object, such that the hand is within the specific area of the back side of the device.
7. A method as recited in claim 6 further comprising: providing the user with visual coherency when the user reaches behind the device.
8. A method as recited in claim 1 wherein tracking movement of the hand further comprises: processing depth data of said hand movement.
9. A method as recited in claim 1 further comprising: executing tracking software.
10. A method as recited in claim 1 wherein the sensor is a tracking component that faces outward from the back side of the device and wherein the sensor is a camera.
11. A method as recited in claim 1 wherein displaying the hand further comprises: displaying a composited image of the hand on the display.
12. A method as recited in claim 1 wherein displaying the hand further comprises: displaying a virtualized image of the hand on the display.
13. A method as recited in claim 1 further comprising: providing haptic feedback to the hand when a collision is detected between the displayed hand and the 3D object.
14. A method as recited in claim 1 wherein there is no contact between the hand and the back side of the device or with the display.
15. A device having a display, the device comprising: a processor; a memory storing digital 3D content data; a tracking sensor component for tracking movement of an object in proximity of the device, wherein the tracking sensor component is on a back side of the device facing away from a user; a hand tracking module for processing movement data related to a user hand; and a hand-3D object collision module for detecting a collision between the user hand and a 3D object.
16. A device as recited in claim 15 further comprising: a face tracking sensor component for tracking face movement in proximity of a front side of a device; and a face tracking module for processing face movement data related to user face movement in front of the device.
17. A device as recited in claim 15 further comprising: a hand gesture detection module for detecting user hand gestures made within range of the tracking sensor component.
18. A device as recited in claim 15 further comprising: a tactile feedback controller for providing tactile feedback to the user hand.
19. A device as recited in claim 15 wherein a tracking sensor component is a camera-based component.
20. A device as recited in claim 15 wherein a tracking sensor component is one of an image differentiator, infrared detector, optic flow component, and spectral processor.
21. A device as recited in claim 15 wherein the tracking sensor component tracks movements of the hand when the user moves the hand behind the device within the range of the tracking sensor component.
22. A device as recited in claim 15 wherein the device is one of a mobile device, a nomadic device, and a stationary device.
23. A device as recited in claim 15 further comprising a network interface for connecting to a network to receive digital 3D content data.
24. An apparatus for manipulating digital 3D content, the apparatus having a front side with a display and a back side, the apparatus comprising: means for detecting a hand within a specific area of the back side of the apparatus, the back side having a sensor; means for displaying the hand on the apparatus; means for tracking movement of the hand within the specific area of the back side, wherein said movement is caused by a user intending to manipulate a displayed 3D object; means for detecting a collision between the displayed hand and the displayed 3D object; and means for modifying an image of the 3D object displayed on the apparatus, wherein the apparatus is a 3D in-line mediator between the user and the 3D object.
25. An apparatus as recited in claim 24 further comprising: means for detecting a hand gesture within the specific area.
26. An apparatus as recited in claim 24 wherein means for modifying an image of the 3D object further comprises: means for moving the image of the 3D object.
27. An apparatus as recited in claim 24 further comprising: means for displaying the modified image on the apparatus.
28. An apparatus as recited in claim 24 wherein the user reaches behind the apparatus to manipulate a perceived object corresponding to the 3D object, such that the hand is within the specific area of the back side of the apparatus.
29. An apparatus as recited in claim 24 wherein means for tracking movement of the hand further comprises: means for processing depth data of said hand movement.
30. An apparatus as recited in claim 24 wherein the sensor is a tracking component that faces outward from the back side of the apparatus and wherein the sensor is a camera.
31. An apparatus as recited in claim 24 wherein means for displaying the hand further comprises: means for displaying a composited image of the hand on the apparatus.
32. An apparatus as recited in claim 24 further comprising: means for providing haptic feedback to the hand when a collision is detected between the displayed hand and the 3D object.